-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: refactor Series.reindex to core/generic #4610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @jtratner I had to create a core.index.identical method (similar to .equals) but also compares attributes (e.g. name) we don't have anything like this, right? do you know python calls for |
There's an issue for that isn't there? Comparing indexes including meta data |
prob is |
could only find #3317, with a just passing mention....should i open a new issue for it? really just a decision....i remember preserving |
Example of lst = [1, 2, 3]
lst2 = list(lst)
assert not (lst is lst2) You could potentially add a I'm not totally clear on what you're looking for here. I don't think it's a good idea to have identical test for object identity (though, granted, there is some overhead with views of MultiIndex, but not much [approx space complexity O(n+m) where n and m are length of the set of levels and length of the set of labels]). |
I'm wrong on space complexity of MI view - it's whatever space python needs to make a list pointing to all the lists of levels and a list pointing to all the lists of labels. (so a MI with 2 sets of levels and 2 lists of labels costs two lists of length 2 to shallow copy) very minimal. |
its just a very simple issue if you are reindexing with an index that is identical (and copy is not set), then just return the object but this HAS to account for an equals index, but with a different name for example |
BUG/CLN: (GH4604) Refactor Series.reindex to core/generic.py allow method= in reindexing on a Series to work API/CLN: GH4604 Infer and downcast dtype if appropriate on ffill/bfill this is for consistency when doing: df.reindex().ffill() and df.reindex(method='ffill') CLN: allow backfill/pad/interpolate to operate on integers (by float conversion) provide downcasting back to original dtype where needed core.internals.interpolate ENH: provide core.index.identical method to compare values and attributes similar to .equals API: changed back to pre-GH3482 where a reindex with no args will by default copy
makes sense?
|
Does |
yep freq/tz on datetime index, freq on period |
on MI, levels are already tested (and I think labels) via equals, so it just tests name, |
👍 |
what if so all current code will work, but then true equality is tested by |
that will break tests... plus that might be kind of a nasty surprise to people who don't religiously keep track of |
fair enough... |
for the record i would like that, but too much back-incompat. would actually fix #3317 i believe |
Oh okay, I wasn't totally clear on your question. Looks good :) |
Similar to equals, but check that other comparable attributes are also equal | ||
""" | ||
return self.equals(other) and all( | ||
[ getattr(self,c,None) == getattr(other,c,None) for c in self._comparables ]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor point: if the attr checks are moved to the left hand side and u use a generator in all
best case will be faster. totally minor though bc len(_comparables) == 1
! u will return False
faster than having to check the whole array if attrs are in fact not equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed that...but for the most part the name attributes are usually None IME
Will look over in a minute. |
@jreback this doesn't quite work for MultiIndex yet. Is it supposed to? E.g. mi = pd.util.testing.makeCustomDataframe(5, 5).stack().index
assert mi.identical(mi)
other = mi.from_tuples(tuple(mi))
assert not mi.identical(other), "Failed - other does not have same names" On a related note - for some reason, MultIIndex appears to have a In [16]: print(mi.name)
None
In [17]: print(mi.names)
FrozenList([u'R0', u'C0'])
In [18]: print(other.name)
None
In [19]: print(other.names)
FrozenList([None, None]) |
Ah, it's because |
I think there is prob some compatibility going on here iow you need to know whether an object is an index or mi and then use name or names |
@jtratner side issue - |
@jreback the real problem is that rename and name should not exist for MultiIndex (they only make sense for |
That said, why can't you just set |
? |
@jtratner oh I did do that, I was trying to use |
@jreback You're right, that's definitely broken. |
@jreback I'll put up a fix tomorrow. Sorry about that! |
no...thank you for testing.....when I find something breaking...I realized that I didn't test it! |
@jreback however, your example doesn't actually fail: In [1]: from pandas import *
ind
In [2]: ind = Index(range(10))
In [3]: ind.name
In [4]: ind.rename('apple')
Out[4]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)
In [5]: ind = _
In [6]: ind.name
Out[6]: 'apple' |
The issue is that |
And it works for MultiIndex too...can you show what you mean by failing? In [8]: import pandas as pd
In [9]: mi = pd.util.testing.makeCustomDataframe(5, 5).stack().index
In [10]: mi
Out[10]:
MultiIndex
[(u'R_l0_g0', u'C_l0_g0'), (u'R_l0_g0', u'C_l0_g1'), (u'R_l0_g0', u'C_l0_g2'), (u'R_l0_g0', u'C_l0_g3'), (u'R_l0_g0', u'C_l0_g4'), (u'R_l0_g1', u'C_l0_g0'), (u'R_l0_g1', u'C_l0_g1'), (u'R_l0_g1', u'C_l0_g2'), (u'R_l0_g1', u'C_l0_g3'), (u'R_l0_g1', u'C_l0_g4'), (u'R_l0_g2', u'C_l0_g0'), (u'R_l0_g2', u'C_l0_g1'), (u'R_l0_g2', u'C_l0_g2'), (u'R_l0_g2', u'C_l0_g3'), (u'R_l0_g2', u'C_l0_g4'), (u'R_l0_g3', u'C_l0_g0'), (u'R_l0_g3', u'C_l0_g1'), (u'R_l0_g3', u'C_l0_g2'), (u'R_l0_g3', u'C_l0_g3'), (u'R_l0_g3', u'C_l0_g4'), (u'R_l0_g4', u'C_l0_g0'), (u'R_l0_g4', u'C_l0_g1'), (u'R_l0_g4', u'C_l0_g2'), (u'R_l0_g4', u'C_l0_g3'), (u'R_l0_g4', u'C_l0_g4')]
In [11]: mi.set_names(["foo", "bar"])
Out[11]:
MultiIndex
[(u'R_l0_g0', u'C_l0_g0'), (u'R_l0_g0', u'C_l0_g1'), (u'R_l0_g0', u'C_l0_g2'), (u'R_l0_g0', u'C_l0_g3'), (u'R_l0_g0', u'C_l0_g4'), (u'R_l0_g1', u'C_l0_g0'), (u'R_l0_g1', u'C_l0_g1'), (u'R_l0_g1', u'C_l0_g2'), (u'R_l0_g1', u'C_l0_g3'), (u'R_l0_g1', u'C_l0_g4'), (u'R_l0_g2', u'C_l0_g0'), (u'R_l0_g2', u'C_l0_g1'), (u'R_l0_g2', u'C_l0_g2'), (u'R_l0_g2', u'C_l0_g3'), (u'R_l0_g2', u'C_l0_g4'), (u'R_l0_g3', u'C_l0_g0'), (u'R_l0_g3', u'C_l0_g1'), (u'R_l0_g3', u'C_l0_g2'), (u'R_l0_g3', u'C_l0_g3'), (u'R_l0_g3', u'C_l0_g4'), (u'R_l0_g4', u'C_l0_g0'), (u'R_l0_g4', u'C_l0_g1'), (u'R_l0_g4', u'C_l0_g2'), (u'R_l0_g4', u'C_l0_g3'), (u'R_l0_g4', u'C_l0_g4')]
In [12]: _.names
Out[12]: FrozenList([u'foo', u'bar']) |
ahh...sorry...what I meant was that These all should yield the same (I know I am abusing the input), but I am user of your functions!
|
|
@jreback okay, well, trivial to enable rename is basically just |
@jtratner is |
R users are probably familiar with |
@jreback It would be for |
@cpcloud heh, I introduced rename a few PRs ago so I'm not sure that it's compliant, would be good to check that though. |
@cpcloud uh, I think you're thinking of dataframe/series here. Having a plyr-like rename would be kinda weird on Index. |
yep ur right, carry on good sirs |
Thing is the what is plyr rename do? (is it not the obvious thing?)....like (attach)?? |
@jreback ha! no....it's like |
@cpcloud oh...that's not so obvious |
here's what
but that's OT |
@jreback you can set names on index too: ind.names = ['apple']
ind.name # 'apple' I'd say the behavior that makes sense is something like:
So if you want to do something generically, you should use |
@jreback well, I'll alias |
@jtratner that all looks good |
…x if freq is not specified during _ensure_index PERF: remove automatic downcasting, instead do on-demand via 'downcast=infer' TST: fix up failing tests CLN: reindex/fill in internals moved to core/internals/Block/reindex_items_from PERF: don't automatically downcast with a float block BUG: GH3317 reverse prior fix in tseries/offsets, to change slightly the multi reindex TST: identical index tests BUG: GH4618 provide automatic downcasting on a reindexed with method Series (in this case a shifted boolean then filled series)
CLN: refactor Series.reindex to core/generic
@jreback to be clear - the only 'bug' with rename now is that MultiIndex should set |
I think that is right |
closes #4604, #3317, #4618
Incorrect results
Fixed (and happens to automatically cast as well back to the original dtype)
(this was broken in 0.12)
This does NOT automatically downcast here as this is a major
performance bottleneck if we did (this is the same result as in 0.12)
You can force downcasting by passing 'infer' if you want
(this option was not available in 0.12 for Series)
TST: GH4604, reindexing with a method of 'ffill' gives incorrect results
BUG/CLN: (GH4604) Refactor Series.reindex to core/generic.py allow method= in reindexing
on a Series to work
API/CLN: GH4604 Infer and downcast dtype if appropriate on ffill/bfill
this is for consistency when doing: df.reindex().ffill() and df.reindex(method='ffill')
CLN: allow backfill/pad/interpolate to operate on integers (by float conversion)
provide downcasting back to original dtype where needed core.internals.interpolate
ENH: provide core.index.identical method to compare values and attributes similar to .equals
API: changed back to pre-GH3482 where a reindex with no args will by default copy